TABLE 3.3
Performance contributions of the components in RBCNs on CIFAR100 (accuracy, %), where Bi = Bi-Real Net, R = RBConv, G = GAN, and B = update strategy.

Model | Kernel Stage  | Bi    | R     | R+G   | R+G+B
RBCN  | 32-32-64-128  | 54.92 | 56.54 | 59.13 | 61.64
RBCN  | 32-64-128-256 | 63.11 | 63.49 | 64.93 | 65.38
RBCN  | 64-64-128-256 | 63.81 | 64.13 | 65.02 | 66.27

Note: The numbers in bold represent the best results.
3) We further improve RBCNs by updating the BN layers, with W and C fixed, after each epoch (line 17 in Algorithm 13). This strategy further increases accuracy by 2.51% (61.64% vs. 59.13%) on CIFAR100 with the 32-32-64-128 kernel stage.
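The idea behind this update strategy can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the preceding layer's binarized weights W (and scale C) are held fixed, and the BN statistics are simply refit on the activations produced during the epoch. The 1x1 "convolution" as a matrix product is a hypothetical stand-in for the real binarized convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_fixed(x, w):
    # Stand-in for a binarized convolution whose weights W and scale C
    # are frozen (hypothetical 1x1 conv as a matrix product).
    return x @ w

def refresh_bn(features, eps=1e-5):
    # Refit BatchNorm statistics over the epoch's activations while the
    # preceding layer's parameters stay fixed.
    mean = features.mean(axis=0)
    var = features.var(axis=0)
    return mean, var, (features - mean) / np.sqrt(var + eps)

w = np.sign(rng.standard_normal((8, 4)))   # fixed binarized weights
x = rng.standard_normal((256, 8))          # one epoch of inputs
mean, var, normed = refresh_bn(conv_fixed(x, w))

# After the refresh, the normalized activations are zero-mean and
# (approximately) unit-variance, realigning BN with the frozen weights.
```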
3.7 BONN: Bayesian Optimized Binary Neural Network
First, we briefly introduce Bayesian learning, a paradigm for constructing statistical models based on Bayes' theorem that provides practical learning algorithms and helps us understand other learning algorithms. Bayesian learning shows its signifi-
FIGURE 3.19
The evolution of the prior p(x), the distribution of the observation y, and the posterior p(x|y) during learning, where x is the latent variable representing the full-precision parameters and y is the quantization error. Initially, the parameters x are initialized according to a single-mode Gaussian distribution. When our learning algorithm converges, the ideal case is that (i) p(y) becomes a Gaussian distribution N(0, ν), which corresponds to the minimum reconstruction error, and (ii) p(x|y) = p(x) is a Gaussian mixture distribution with two modes located at the binarized values x̂ and −x̂.
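The ideal converged posterior described in the caption can be checked numerically: an equal-weight two-mode Gaussian mixture centered at the binarized values ±x̂. The concrete values below (x̂ = 1, mode standard deviation 0.3) are arbitrary choices for illustration, not parameters from the method.

```python
import numpy as np

def gaussian(x, mu, sigma):
    # Univariate Gaussian density N(mu, sigma^2).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_posterior(x, x_hat=1.0, sigma=0.3):
    # Ideal converged posterior p(x|y): an equal-weight two-mode Gaussian
    # mixture centered at the binarized values +x_hat and -x_hat.
    return 0.5 * gaussian(x, x_hat, sigma) + 0.5 * gaussian(x, -x_hat, sigma)

xs = np.linspace(-2.0, 2.0, 401)
density = mixture_posterior(xs)

# The density peaks at the two binarized values rather than at zero, so
# full-precision weights concentrate around +x_hat and -x_hat.
peak = xs[np.argmax(density)]
```

Because the mixture is bimodal, the full-precision parameters are pulled toward the two binarized values, which is exactly the behavior the figure depicts at convergence.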